Swift: Model withUnsafeBytes and similar closure methods#13827
Swift: Model withUnsafeBytes and similar closure methods#13827MathiasVP merged 23 commits intogithub:mainfrom
Conversation
…so define models there as well.
|
With the content flow improvements in |
|
(DCA LGTM) |
|
Should those really all be taint steps? I wonder if we could do value flow if we flowed through the UnsafeByte pointers as |
|
Examples such as this (from https://developer.apple.com/documentation/swift/array/withunsafebufferpointer(_:)) certainly makes it seem like we should be using some kind of let numbers = [1, 2, 3, 4, 5]
let sum = numbers.withUnsafeBufferPointer { buffer -> Int in
var result = 0
for i in stride(from: buffer.startIndex, to: buffer.endIndex, by: 2) {
result += buffer[i]
}
return result
}it sounds like |
Oh I see what you mean - though the container itself is not the same (it's a collection outside, but a pointer/buffer inside the closure), the pointed-to elements are identical to those in the original container (at least if the pointer / buffer is typed - so the exception would be any cases that convert to a so-called "Raw" pointer, which should just be taint flow).
I think this PR was affected by not having |
|
Updated:
There's quite a lot going on in this PR now. I'll run DCA again to be safe. |
|
Just pushed a fix for the UnsafeJsEval test failure. The fix isn't perfect, because we have an ad-hoc model of |
|
The DCA repeat-run looks fine. The PR checks all pass now. I would ideally like to merge this PR soon and before it gets any bigger... |
MathiasVP
left a comment
There was a problem hiding this comment.
There's quite a lot of nice to have things as follow-ups, but they're all planned as future work. So in the spirit of not having this PR linger for any longer I'll just merge it now ![]()
| ( | ||
| subscript.getBase().getType() instanceof ArrayType and | ||
| c.isSingleton(any(Content::ArrayContent ac)) | ||
| or | ||
| c.isSingleton(any(Content::CollectionContent ac)) | ||
| ) |
There was a problem hiding this comment.
I hope that a future PR can either collapse ArrayContent and CollectionContent into a single IPA branch, or that we can ensure that a given read/store step always reads a single Content. But obviously, that's not for this PR to handle. So this LGTM for now
| * - elements of collections, such as `Set<Element>`. | ||
| * - elements of buffers, such as `UnsafeBufferPointer<Element>`. | ||
| * - the pointee of a pointer, such as `UnsafePointer<Pointee>`. |
There was a problem hiding this comment.
It may be a small performance win to split up these three things into separate Contents in the future. But we can delay that for now, obviously.
There was a problem hiding this comment.
UnsafeBufferPointer is in fact yet another type of Collection, so it falls into the existing discussion about merging or splitting collection content types properly.
UnsafePointer appears to not be a collection though. I'll create an issue for giving it its own content type.
| ( | ||
| isSink(node) | ||
| or | ||
| isAdditionalFlowStep(node, _) |
There was a problem hiding this comment.
Hmmm.... Why do we need implicit read steps at the additional flow steps, and not just at the sink? I assume this is related to the ad-hoc model you had to provide for .baseAddress?
There was a problem hiding this comment.
Yes I think that was why, and there's an issue for turning that ad-hoc model into a general one and getting rid of the remaining special cases here.
| c.getAReadContent() instanceof DataFlow::Content::ArrayContent or | ||
| c.getAReadContent() instanceof DataFlow::Content::CollectionContent |
There was a problem hiding this comment.
Once again, I really hope we can merge these eventually. We're straining the dataflow library quite a lot with these duplications (i.e., if we have a path that should have an access path like [a, CollectionContent, b, CollectionContent] we're really forcing the dataflow library to have:
[Field[a], CollectionContent, Field[b], CollectionContent]
[Field[a], CollectionContent, Field[b], ArrayContent]
[Field[a], ArrayContent, Field[b], CollectionContent]
[Field[a], ArrayContent, Field[b], ArrayContent]
|
Thanks for reviewing and merging! |
(draft PR because this is currently sitting atop an early version of #13741 that is necessary to get some of the results; please ignore those commits, they will be rebased away once that PR is merged)
Model
withUnsafeBytesand some similar closure methods.Limitations:
.ArrayElementto access content when we know theCollectionis anArray. I've used two models - one with.ArrayElementand one with no content specifier - when the object could be anArrayor another type ofCollection. I've used no content specifier for pointers (UnsafePointeretc) since we don't have content for any of those (yet). Thus, some level of conflation (e.g. betweenptrandptr[0]) is inevitable at this stage. We should be able to improve accuracy as we implement further types of content flow..baseAddress) are not yet modelled. There is an in-progress issue for these.I'm not sure why taint from array elements isn't always flowing into closures, e.g. inint.swiftline 19. It works if the qualifier is a tainted array, but not if it's an array with tainted content as in the test (despite there being models for both cases).very likely addresseshas addressed this.TODO:
maywill also benefit from 098db58 (for missing test results such asint.swiftline 19)